DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Tiny machine learning (TinyML) aims to run ML models on small devices and is increasingly favored for its enhanced privacy, reduced latency, and low cost. Recently, the advent of tiny AI accelerators has revolutionized the TinyML field by significantly enhancing hardware processing power. These accelerators, equipped with multiple parallel processors and dedicated per-processor memory instances, offer substantial performance improvements over traditional microcontroller units (MCUs).
RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning
Khalil, Khurram, Khaliq, Muhammad Mahad, Hoque, Khaza Anuarul
Abstract--The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage of critical failure modes. This paper introduces RIFT (Reinforcement Learning-guided Intelligent Fault Targeting), a scalable framework that automates the discovery of minimal, high-impact fault scenarios for efficient design-time fault assessment. RIFT transforms the complex search for worst-case faults into a sequential decision-making problem, combining hybrid sensitivity analysis for search space pruning with reinforcement learning to intelligently generate minimal, high-impact test suites. Evaluated on billion-parameter Large Language Model (LLM) workloads using NVIDIA A100 GPUs, RIFT achieves a 2.2x fault assessment speedup over evolutionary methods and reduces the required test vector volume by over 99% compared to random fault injection, all while achieving superior fault coverage. The proposed framework also provides actionable data to enable intelligent hardware protection strategies, demonstrating that RIFT-guided selective error correction code provides a 12.8x improvement in cost-effectiveness (coverage per unit area) compared to uniform triple modular redundancy protection. RIFT automatically generates UVM-compliant verification artifacts, ensuring its findings are directly actionable and integrable into commercial RTL verification workflows. The recent advent of Large Language Models (LLMs) with hundreds of billions of parameters has had a transformative impact on computing, but has also introduced unprecedented computational demands [1].
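The sequential decision-making framing described above can be illustrated with a toy sketch: an epsilon-greedy agent learns which fault sites degrade output most, so that a small test suite concentrates on worst-case faults. The fault sites, the impact model, and all numbers below are illustrative assumptions, not RIFT's actual design.

```python
import random

def fault_impact(site):
    # Stand-in for injecting a fault at `site` during a workload run and
    # measuring output degradation; here a fixed synthetic profile.
    return {0: 0.1, 1: 0.9, 2: 0.3, 3: 0.8}[site]

def build_test_suite(n_sites=4, episodes=200, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [0.0] * n_sites          # estimated impact per fault site
    counts = [0] * n_sites
    for _ in range(episodes):
        if rng.random() < eps:   # explore a random site
            site = rng.randrange(n_sites)
        else:                    # exploit the current best estimate
            site = max(range(n_sites), key=lambda s: q[s])
        r = fault_impact(site)
        counts[site] += 1
        q[site] += (r - q[site]) / counts[site]  # incremental mean
    # keep only high-impact sites -> a minimal, high-impact test suite
    return sorted(s for s in range(n_sites) if q[s] > 0.5)

print(build_test_suite())  # only the high-impact fault sites survive
```

In this toy setting the agent converges on the two sites with impact above the threshold while spending most of its injection budget on them, which is the intuition behind reducing test vector volume relative to random injection.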
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management
Zhou, Zhongchun, Lai, Chengtao, Gu, Yuhang, Zhang, Wei
Abstract--The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and their asynchronous management, we investigate the opposite end of the design spectrum: a multi-core AI accelerator equipped with a shared system-level cache and application-aware management policies, which keeps the programming effort modest. Our approach exploits dataflow information available in the software stack to guide cache replacement (including dead-block prediction), in concert with bypass decisions and mechanisms that alleviate cache thrashing. We assess the proposal using a cycle-accurate simulator and observe substantial performance gains (up to 1.80x speedup) compared with conventional cache architectures. In addition, we build and validate an analytical model that accounts for the actual overlapping behaviors to extend the measured results of our policies to larger-scale real-world workloads. Experimental results show that, when functioning together, our bypassing and thrashing-mitigation strategies can handle scenarios both with and without inter-core data sharing and achieve remarkable speedups. Finally, we implement the design in RTL; its area is 0.064 mm². Our findings explore the potential of the shared cache design to assist the development of future AI accelerator systems.
A preliminary version of this paper appeared in the proceedings of ICS 2024. Z. Zhou and C. Lai contributed equally to this work. Z. Zhou and C. Lai are with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: zzhouch@connect.ust.hk). Y. Gu is with the School of Electronic Science and Engineering, Southeast University, Nanjing, Jiangsu, China. W. Zhang (corresponding author) is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: eeweiz@ust.hk).
With the advent of the artificial intelligence (AI) era, the demand for AI-tailored hardware has surged across various environments, from data centers to embedded systems. These accelerators span a broad spectrum, from power-efficient devices to those designed for high computational throughput [34]. AI accelerators, compared with Graphics Processing Units (GPUs), can be optimized for AI applications and tailored for specific scenarios, such as pre-defined neural network (NN) computation graphs, operator types, certain data precisions, and given power budgets. Since they are often used in scenarios where the execution graph is known during compilation, they typically employ software-controlled scratchpad memories (SPMs) as the on-chip storage.
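The interplay of dead-block prediction and bypassing described in the abstract can be sketched in a few lines: software-supplied hints mark a line dead after its last use (freeing capacity early) and route streaming tensors around the cache so they cannot thrash resident working sets. This is a minimal illustrative model, not the paper's RTL design; all names and sizes are assumptions.

```python
from collections import OrderedDict

class HintedCache:
    """LRU cache with dataflow hints: dead-after-access and bypass."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # addr -> data, kept in LRU order
        self.hits = self.misses = 0

    def access(self, addr, data, dead_after=False, streaming=False):
        if streaming:                # bypass: never allocate a line
            self.misses += 1
            return data
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)         # refresh LRU position
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # evict the LRU line
            self.lines[addr] = data
        if dead_after:               # dead-block hint: free the line now
            self.lines.pop(addr, None)
        return data

cache = HintedCache(capacity=2)
cache.access("A", 1)                     # miss, line allocated
cache.access("A", 1)                     # hit
cache.access("W", 2, streaming=True)     # bypassed; "A" stays resident
cache.access("A", 1, dead_after=True)    # hit, then evicted early
print(cache.hits, cache.misses)
```

The streaming access counts as a miss but does not displace the resident tensor, and the dead-block hint vacates a line well before LRU would have, which is the core of the application-aware policy.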
- North America > United States (0.04)
- Europe > France (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Improving AI Efficiency in Data Centres by Power Dynamic Response
Marinoni, Andrea, Shivareddy, Sai, Lio', Pietro, Lin, Weisi, Cambria, Erik, Grey, Clare
The steady growth of artificial intelligence (AI) has accelerated in recent years, facilitated by the development of sophisticated models such as large language models and foundation models. Ensuring robust and reliable power infrastructures is fundamental to taking advantage of the full potential of AI. However, AI data centres are extremely hungry for power, putting the problem of their power management in the spotlight, especially with respect to their impact on the environment and sustainable development. In this work, we investigate the capacity and limits of solutions based on an innovative approach to the power management of AI data centres, i.e., making part of the input power as dynamic as the power used for data-computing functions. The performance of passive and active devices is quantified and compared in terms of computational gain, energy efficiency, reduction of capital expenditure, and management costs by analysing power trends from multiple data platforms worldwide. This strategy, which represents a paradigm shift in AI data centre power management, has the potential to strongly improve the sustainability of AI hyperscalers, improving their environmental, financial, and societal impact.
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- Research Report > Promising Solution (0.48)
- Overview > Innovation (0.48)
- Information Technology > Information Management (1.00)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
This paper presents the first comprehensive cross-architectural performance analysis of contemporary AI accelerators designed for LLM inference, introducing a novel workload-centric evaluation methodology that quantifies architectural fitness across operational regimes. We provide the first systematic comparison of memory hierarchies, compute architectures, and interconnect strategies across the full spectrum of commercial accelerators, from GPU-based designs to specialized wafer-scale engines. Our analysis reveals that no single architecture dominates across all workload categories, with performance variations of up to 3.7x between architectures depending on batch size and sequence length. We quantitatively evaluate four primary scaling strategies for trillion-parameter models, demonstrating that expert parallelism delivers the best parameter-to-compute ratio (8.4x) but introduces 2.1x latency variance compared to tensor parallelism. This work provides system designers with actionable insights for accelerator selection based on workload characteristics, while identifying key architectural gaps in current designs that will shape future hardware development.
- Research Report (0.82)
- Overview (0.68)
High-Throughput LLM Inference on Heterogeneous Clusters
Xiong, Yi, Huang, Jinqi, Huang, Wenjie, Yu, Xuebing, Li, Entong, Ning, Zhixiong, Zhou, Jinhua, Zeng, Li, Chen, Xin
Nowadays, many companies possess various types of AI accelerators, forming heterogeneous clusters. Efficiently leveraging these clusters for high-throughput large language model (LLM) inference services can significantly reduce costs and expedite task processing. However, LLM inference on heterogeneous clusters presents two main challenges. First, different deployment configurations can result in vastly different performance. The number of possible configurations is large, and evaluating the effectiveness of a specific setup is complex, so finding an optimal configuration is not an easy task. Second, LLM inference instances within a heterogeneous cluster possess varying processing capacities, leading to different processing speeds for handling inference requests. Evaluating these capacities and designing a request scheduling algorithm that fully maximizes the potential of each instance is challenging. In this paper, we propose a high-throughput inference service system for heterogeneous clusters. First, the deployment configuration is optimized by modeling the resource amount and expected throughput and then applying exhaustive search. Second, a novel mechanism is proposed to schedule requests among instances, which fully considers the different processing capabilities of the various instances. Extensive experiments show that the proposed scheduler improves throughput by 122.5% and 33.6% on two heterogeneous clusters, respectively.
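Capacity-aware request scheduling of the kind described above can be sketched as earliest-estimated-completion dispatch: each instance carries a measured throughput, and every incoming request goes to the instance whose queue will drain soonest. The instance names, capacities, and cost model below are made up for illustration and are not the paper's actual scheduler.

```python
import heapq

def schedule(requests, capacities):
    # Min-heap of (estimated_free_time, instance); faster instances
    # accumulate less work per request and therefore drain sooner.
    heap = [(0.0, name) for name in sorted(capacities)]
    heapq.heapify(heap)
    assignment = {}
    for req in requests:
        free_at, name = heapq.heappop(heap)
        assignment[req] = name
        # One request adds 1/capacity seconds of work to this instance.
        heapq.heappush(heap, (free_at + 1.0 / capacities[name], name))
    return assignment

caps = {"a100": 4.0, "v100": 2.0, "t4": 1.0}   # relative throughputs (assumed)
plan = schedule([f"req{i}" for i in range(7)], caps)
loads = {n: sum(1 for v in plan.values() if v == n) for n in caps}
print(loads)  # faster instances receive proportionally more requests
```

With these toy capacities the 4:2:1 throughput ratio yields a 4:2:1 request split, so no instance becomes the bottleneck; a real system would additionally refresh the capacity estimates online as request lengths vary.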
Runtime Detection of Adversarial Attacks in AI Accelerators Using Performance Counters
Rahaman, Habibur, Chatterjee, Atri, Bhunia, Swarup
Rapid adoption of AI technologies raises several major security concerns, including the risks of adversarial perturbations, which threaten the confidentiality and integrity of AI applications. Protecting AI hardware from misuse and diverse security threats is a challenging task. To address this challenge, we propose SAMURAI, a novel framework for safeguarding against malicious usage of AI hardware and its resilience to attacks. SAMURAI introduces an AI Performance Counter (APC) for tracking dynamic behavior of an AI model coupled with an on-chip Machine Learning (ML) analysis engine, known as TANTO (Trained Anomaly Inspection Through Trace Observation). APC records the runtime profile of the low-level hardware events of different AI operations. Subsequently, the summary information recorded by the APC is processed by TANTO to efficiently identify potential security breaches and ensure secure, responsible use of AI. SAMURAI enables real-time detection of security threats and misuse without relying on traditional software-based solutions that require model integration. Experimental results demonstrate that SAMURAI achieves up to 97% accuracy in detecting adversarial attacks with moderate overhead on various AI models, significantly outperforming conventional software-based approaches. It enhances security and regulatory compliance, providing a comprehensive solution for safeguarding AI against emergent threats.
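The counter-profiling idea behind APC and TANTO can be illustrated with a simple statistical sketch: benign inference runs establish a per-counter profile, and a run whose counters deviate strongly is flagged. The counter names, values, and z-score threshold are assumptions for illustration, not the paper's actual APC/TANTO design.

```python
import statistics

def build_profile(benign_traces):
    # Mean and standard deviation per hardware counter across benign runs.
    keys = benign_traces[0].keys()
    return {k: (statistics.mean(t[k] for t in benign_traces),
                statistics.stdev(t[k] for t in benign_traces))
            for k in keys}

def is_anomalous(trace, profile, z_threshold=3.0):
    # Flag the run if any counter lies far outside its benign distribution.
    for k, (mu, sigma) in profile.items():
        if sigma > 0 and abs(trace[k] - mu) / sigma > z_threshold:
            return True
    return False

# Synthetic counter traces from five benign inference runs.
benign = [{"mac_ops": 1000 + d, "dram_reads": 200 + d} for d in (-5, 0, 5, 3, -3)]
profile = build_profile(benign)

normal = {"mac_ops": 1002, "dram_reads": 201}
attack = {"mac_ops": 1400, "dram_reads": 520}   # perturbed input inflates work
print(is_anomalous(normal, profile), is_anomalous(attack, profile))
```

A trained on-chip classifier, as in the paper, can capture correlations between counters that this per-counter z-score misses, but the sketch conveys why low-level event profiles expose adversarial activity without touching the model's software stack.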
Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators
Irabor, Emmanuel, Musavi, Mariam, Das, Abhijit, Abadal, Sergi
The insatiable appetite of Artificial Intelligence (AI) workloads for computing power is pushing the industry to develop faster and more efficient accelerators. The rigidity of custom hardware, however, conflicts with the need for scalable and versatile architectures capable of catering to the evolving and heterogeneous pool of Machine Learning (ML) models in the literature. In this context, multi-chiplet architectures assembling multiple (perhaps heterogeneous) accelerators are an appealing option that is unfortunately hindered by still rigid and inefficient chip-to-chip interconnects. In this paper, we explore the potential of wireless technology as a complement to existing wired interconnects in this multi-chiplet approach. Using a state-of-the-art evaluation framework, we show that wireless interconnects can lead to speedups of 10% on average and up to 20%. We also highlight the importance of load balancing between the wired and wireless interconnects, which will be further explored in future work.